Pose Constraints for Consistent Self-supervised Monocular Depth and Ego-Motion
Authors
Abstract
Self-supervised monocular depth estimation approaches not only suffer from scale ambiguity but also infer depth maps that are temporally inconsistent with respect to scale. While disambiguating scale during training is not possible without some kind of ground-truth supervision, scale-consistent predictions would make it possible to calculate the scale once during inference as a post-processing step and reuse it over time. With this goal, a set of temporal consistency losses that minimize pose inconsistencies over time is introduced. Evaluations show that introducing these constraints reduces these inconsistencies and improves the baseline ego-motion prediction performance.
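The abstract does not give the exact form of the losses. As an illustration only, here is a minimal sketch of one plausible pose-consistency term, assuming poses are 4x4 homogeneous rigid transforms where T_ij maps points from frame i to frame j, and that "consistent over time" means the directly predicted pose T_02 should agree with the composition of T_01 and T_12. The names `make_pose`, `rotation_angle`, `pose_consistency_loss` and the weight `w_rot` are hypothetical and are not taken from the paper.

```python
import numpy as np


def make_pose(yaw, t):
    """Build a 4x4 rigid transform: rotation `yaw` (radians) about z, translation `t`."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t
    return T


def rotation_angle(R):
    """Geodesic angle (radians) of a 3x3 rotation matrix."""
    # trace(R) = 1 + 2 cos(theta); clipping guards against round-off outside [-1, 1].
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.arccos(cos_theta))


def pose_consistency_loss(T_01, T_12, T_02, w_rot=1.0):
    """Hypothetical consistency term: mismatch between the direct pose T_02 and T_12 @ T_01.

    Assumed convention: T_ij maps homogeneous points from frame i to frame j,
    so a temporally consistent predictor should satisfy T_02 == T_12 @ T_01.
    """
    T_composed = T_12 @ T_01
    # Residual transform; it is the identity when both estimates agree.
    E = np.linalg.inv(T_02) @ T_composed
    trans_err = float(np.linalg.norm(E[:3, 3]))
    rot_err = rotation_angle(E[:3, :3])
    return trans_err + w_rot * rot_err


if __name__ == "__main__":
    T_01 = make_pose(0.1, [1.0, 0.0, 0.0])
    T_12 = make_pose(0.1, [1.0, 0.0, 0.0])
    consistent = T_12 @ T_01
    scaled = consistent.copy()
    scaled[:3, 3] *= 2.0  # same rotation, translation at a different scale
    print(pose_consistency_loss(T_01, T_12, consistent))  # close to zero
    print(pose_consistency_loss(T_01, T_12, scaled))      # clearly positive
```

The second call shows why such a term targets scale drift: a direct pose whose translation is predicted at a different scale than the composed short-range poses is penalized even though its rotation is correct. NumPy is used here only to keep the sketch self-contained; in training, the same expression would be written with a differentiable framework's tensors, and the `arccos` term may need care near zero angle, where its gradient is unbounded.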
Similar resources
Unsupervised Learning of Depth and Ego-Motion from Monocular Video Using 3D Geometric Constraints
We present a novel approach for unsupervised learning of depth and ego-motion from monocular video. Unsupervised learning removes the need for separate supervisory signals (depth or ego-motion ground truth, or multi-view video). Prior work in unsupervised depth learning uses pixel-wise or gradient-based losses, which only consider pixels in small local neighborhoods. Our main contribution is to...
Self-Supervised Monocular Image Depth Learning and Confidence Estimation
Convolutional Neural Networks (CNNs) need large amounts of data with ground truth annotation, which is a challenging problem that has limited the development and fast deployment of CNNs for many computer vision tasks. We propose a novel framework for depth estimation from monocular images with corresponding confidence in a self-supervised manner. A fully differential patch-based cost function is...
DeMoN: Depth and Motion Network for Learning Monocular Stereo
Our network is a chain of encoder-decoder networks. Figures 12 and 13 explain the details of the two encoder-decoders used in the bootstrap and iterative net part. Fig. 14 gives implementation details for the refinement net. The encoder-decoders for the bootstrap and iterative net use additional inputs which come from previous predictions. Some of these inputs, like warped images or depth from o...
DeMoN: Depth and Motion Network for Learning Monocular Stereo
Our network is a chain of encoder-decoder networks. Figures 15 and 16 explain the details of the two encoder-decoders used in the bootstrap and iterative net part. Fig. 17 gives implementation details for the refinement net. The encoder-decoders for the bootstrap and iterative net use additional inputs which come from previous predictions. Some of these inputs, like warped images or depth from o...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2023
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-031-31438-4_23